A Bayesian Approach to Identify Bitcoin Users
Bitcoin is a digital currency and electronic payment system operating over a
peer-to-peer network on the Internet. One of its most important properties is
the high level of anonymity it provides for its users. The users are identified
by their Bitcoin addresses, which are random strings in the public records of
transactions, the blockchain. When a user initiates a Bitcoin-transaction, his
Bitcoin client program relays messages to other clients through the Bitcoin
network. Monitoring the propagation of these messages and analyzing them
carefully reveals hidden relations. In this paper, we develop a mathematical
model using a probabilistic approach to link Bitcoin addresses and transactions
to the IP address of their originator. To apply our model, we carried out
experiments by installing more than a hundred modified Bitcoin clients
distributed across the network to observe as many messages as possible. During
a two-month observation period, we were able to identify several thousand
Bitcoin clients and bind their transactions to geographical locations.
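The abstract does not spell out the probabilistic model, so the following is only a toy illustration of how Bayes' rule can link an address to a candidate IP, not the paper's method. It assumes a hypothetical observation model: for each transaction of one address, the peer that first relays it to the monitoring clients is the true originator with probability `p_direct`, and otherwise a uniformly random other candidate. The function name `posterior_originator` and the parameter `p_direct` are illustrative assumptions.

```python
from collections import Counter
from math import exp, log


def posterior_originator(first_relayers, candidates, p_direct=0.6):
    """Posterior P(originator = ip | observations) under a hypothetical toy model.

    first_relayers: for each transaction of one address, the candidate IP that
        relayed it first to our monitoring clients (assumed to be in candidates).
    candidates: list of possible originator IPs, with a uniform prior over them.
    p_direct: assumed probability that the first relayer is the true originator.
    """
    n = len(candidates)
    if n < 2:
        raise ValueError("need at least two candidate IPs")
    if not 0.0 < p_direct < 1.0:
        raise ValueError("p_direct must lie strictly between 0 and 1")
    # Probability that any one specific non-originator relays first.
    p_other = (1.0 - p_direct) / (n - 1)
    counts = Counter(first_relayers)
    total = len(first_relayers)
    # Log-likelihood of all observations if `ip` were the originator.
    log_lik = {}
    for ip in candidates:
        hits = counts.get(ip, 0)
        log_lik[ip] = hits * log(p_direct) + (total - hits) * log(p_other)
    # Uniform prior, so the posterior is the normalized likelihood;
    # subtract the maximum before exponentiating for numerical stability.
    m = max(log_lik.values())
    weights = {ip: exp(v - m) for ip, v in log_lik.items()}
    z = sum(weights.values())
    return {ip: w / z for ip, w in weights.items()}
```

For example, if an address's four transactions were first relayed by `10.0.0.1` three times and by `10.0.0.2` once, among three candidates, the posterior concentrates on `10.0.0.1` (about 0.87 with the default `p_direct`). Each additional observation multiplies in another likelihood factor, which is why a long observation period sharpens the attribution.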
Race, Religion and the City: Twitter Word Frequency Patterns Reveal Dominant Demographic Dimensions in the United States
Recently, numerous approaches have emerged in the social sciences to exploit
the opportunities made possible by the vast amounts of data generated by online
social networks (OSNs). Having access to information about users on such a
scale opens up a range of possibilities, all without the limitations associated
with often slow and expensive paper-based polls. A question that remains to be
satisfactorily addressed, however, is how demography is represented in OSN
content. Here, we study language use in the US using a corpus of text compiled
from over half a billion geo-tagged messages from the online microblogging
platform Twitter. Our intention is to reveal the most important spatial
patterns in language use in an unsupervised manner and relate them to
demographics. Our approach is based on Latent Semantic Analysis (LSA) augmented
with the Robust Principal Component Analysis (RPCA) methodology. We find
spatially correlated patterns that can be interpreted based on the words
associated with them. The main language features can be related to slang use,
urbanization, travel, religion and ethnicity, the patterns of which are shown
to correlate plausibly with traditional census data. Our findings thus validate
the concept of demography being represented in OSN language use and show that
the traits observed are inherently present in the word frequencies without any
previous assumptions about the dataset. They could therefore form the basis of
further research focusing on the evaluation of demographic data estimation from
other big data sources, or on the dynamical processes that result in the
patterns found here.
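A minimal sketch of the LSA step, assuming the data have already been aggregated into a word-by-region count matrix (for example, word counts per county). It uses a plain truncated SVD from NumPy and omits the RPCA stage described above, which would first separate the matrix into a low-rank part and a sparse outlier part; it is therefore not the authors' pipeline. The function name `lsa_spatial_patterns` and the normalization choices (per-region relative frequencies, per-word centering) are assumptions.

```python
import numpy as np


def lsa_spatial_patterns(counts, k=2):
    """Leading LSA components of a word-by-region count matrix.

    counts: array of shape (n_words, n_regions) with raw word counts per region.
    Returns (word_loadings, region_scores, singular_values) for the top k
    components. The RPCA denoising step is intentionally omitted here.
    """
    counts = np.asarray(counts, dtype=float)
    # Relative word frequencies within each region (each column sums to 1).
    freqs = counts / counts.sum(axis=0, keepdims=True)
    # Center each word across regions so components capture spatial variation
    # rather than overall word popularity.
    centered = freqs - freqs.mean(axis=1, keepdims=True)
    # Thin SVD: columns of u are word loadings, rows of vt are spatial patterns.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k], vt[:k, :], s[:k]


if __name__ == "__main__":
    # Toy data: words 0-1 dominate regions 0-1, words 2-3 dominate regions 2-3.
    toy = [[10, 12, 1, 2], [8, 9, 2, 1], [1, 2, 11, 10], [2, 1, 9, 12]]
    words, regions, sv = lsa_spatial_patterns(toy, k=2)
    print(np.round(regions[0], 2))  # first spatial pattern splits {0,1} from {2,3}
```

In the real setting, each row of `region_scores` would be mapped back onto the regions and compared with census variables, while the largest entries of the matching `word_loadings` column suggest how to interpret the pattern (slang, religion, and so on).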
The rich still get richer: Empirical comparison of preferential attachment via linking statistics in Bitcoin and Ethereum
Bitcoin and Ethereum transactions present one of the largest real-world
complex networks that are publicly available for study, including a detailed
picture of their time evolution. As such, they have received a considerable
amount of attention from the network science community, besides analysis from
economic or cryptographic perspectives. Among these studies, in an analysis of
an early version of the Bitcoin network, we showed the clear presence of
preferential attachment, or the "rich-get-richer" phenomenon. Now we revisit
this question using a recent version of the Bitcoin network that has grown
almost 100-fold since our original analysis. In addition, we carry out a
comparison with Ethereum, the second most important cryptocurrency. Our
results show that preferential attachment continues to be a key factor in the
evolution of both the Bitcoin and Ethereum transaction networks. To facilitate
further analysis, we publish a recent version of both transaction networks,
and an efficient software implementation that can evaluate the linking
statistics necessary to learn about preferential attachment on networks with
several hundred million edges.
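The abstract does not describe how the published implementation computes its linking statistics. The sketch below shows one common way to gather them, assuming a directed, time-ordered edge list and using the target's in-degree: for every new edge to an existing node, record the target's current degree, together with how many nodes had each degree at that moment. The name `attachment_kernel` is hypothetical, and the loop costs O(E × D) for D distinct degrees, so it illustrates the statistic rather than the efficient implementation mentioned above.

```python
from collections import defaultdict


def attachment_kernel(edges):
    """Estimate a relative attachment rate A(k) from time-ordered edges.

    edges: iterable of (source, target) pairs in chronological order.
    For each edge whose target already exists, hits[k] counts links received
    by nodes of in-degree k, and exposure[k] sums the number of nodes that had
    in-degree k at that moment. A(k) = hits[k] / exposure[k] is proportional
    to the rate at which a single degree-k node receives the next link.
    """
    in_degree = {}
    nodes_with_degree = defaultdict(int)  # k -> number of nodes with in-degree k
    hits = defaultdict(int)
    exposure = defaultdict(int)
    for source, target in edges:
        if target in in_degree:
            # Snapshot of the degree distribution just before this link forms.
            for k, n_k in nodes_with_degree.items():
                exposure[k] += n_k
            hits[in_degree[target]] += 1
        # Register previously unseen endpoints with in-degree 0.
        for node in (source, target):
            if node not in in_degree:
                in_degree[node] = 0
                nodes_with_degree[0] += 1
        # Move the target from degree k to k + 1.
        k = in_degree[target]
        nodes_with_degree[k] -= 1
        if nodes_with_degree[k] == 0:
            del nodes_with_degree[k]
        in_degree[target] = k + 1
        nodes_with_degree[k + 1] += 1
    return {k: hits[k] / exposure[k] for k in sorted(hits)}
```

Under preferential attachment, the estimated A(k) grows with k (roughly linearly in the classic case). A flat A(k) would instead indicate that new links pick existing nodes at random.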